HDFS scalability: the limits to growth
Author
Konstantin V. Shvachko is a principal software engineer at Yahoo!, where he develops HDFS. He specializes in efficient data structures and algorithms for large-scale distributed storage systems. He discovered a new type of balanced trees, S-trees, for optimal indexing of unstructured data, and he was a primary developer of an S-tree-based Linux file system, treeFS, a prototype of reiserFS. Konstantin holds a Ph.D. in computer science from Moscow State University, Russia. He is also a member of the Project Management Committee for Apache Hadoop.
Related resources
Ceph as a scalable alternative to the Hadoop Distributed File System
The Hadoop Distributed File System (HDFS) has a single metadata server that sets a hard limit on its maximum size. Ceph, a high-performance distributed file system under development since 2005 and now supported in Linux, bypasses the scaling limits of HDFS. We describe Ceph and its elements and provide instructions for installing a demonstration system that can be used...
High Scalability of HDFS using Distributed Namespace
Hadoop is widely used by organizations for data-intensive computing. Its client applications require high availability and scalability from the system. Most of these applications are online, and their data growth rate is unpredictable. The present Hadoop relies on a secondary namenode for failover, which slows down the performance of the system. Hadoop system’s scalability depends on the ve...
Using Hadoop as a Grid Storage Element
Hadoop is an open-source data processing framework that includes a scalable, fault-tolerant distributed file system, HDFS. Although HDFS was designed to work in conjunction with Hadoop’s job scheduler, we have repurposed it to serve as a grid storage element by adding GridFTP and SRM servers. We have tested the system thoroughly in order to understand its scalability and fault tolerance. The tu...
HopsFS: Scaling Hierarchical File System Metadata Using NewSQL Databases
Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS’ sing...
A Novel Approach for Improving Security and Storage Efficiency on HDFS
Distributed file systems for storing massive files have obvious advantages over conventional file systems. For instance, the Hadoop Distributed File System (HDFS), implemented on commodity hardware, offers low cost, high fault tolerance, scalability, etc. However, HDFS poses a potential security risk because the data stored in the Datanode is unencrypted, which may cause da...